Stochastic models for document restructuration

Authors

  • Patrick Gallinari
  • Guillaume Wisniewski
  • Ludovic Denoyer
  • Francis Maes
Abstract

Document (re)structuration consists of mapping documents that come from different sources and use different formats onto a predefined semi-structured format. This generic problem appears in several application settings, such as querying heterogeneous semi-structured databases, peer-to-peer systems, legacy document conversion, and XML information retrieval. In the paper, we define the restructuration problem from a document-centric perspective and identify the main issues it raises. We then consider two restructuration instances: structuring flat documents and learning the correspondence between structured formats. We propose stochastic models for these two tasks and describe tests on a large XML document collection.


Similar resources

Restructuration automatique de documents dans les corpus semi-structurés hétérogènes

Abstract. Querying large collections of semi-structured documents (such as XML) is an important open problem. To query a document whose schema is new, a system must be able either to adapt the query to the document or to adapt the document so that the query can be applied to it. We position ourselves here within the framework of document restructuring, which consi...


Modèle probabiliste pour l'extraction de structures dans les documents semistructurés - Application aux documents Web

With content management systems becoming mainstream, the Web has changed dramatically: more and more web pages are now generated from relational databases, and their design reflects the logical structure of documents. In this work, we show that there is enough information in the layout of a web document to capture the kind of data people are already producing in a more machine-friendly format. The...


Méthode de formation et de restructuration dynamique de coalitions d'agents fondée sur l'optimum de Pareto

This paper presents a coalition formation protocol for multi-agent systems that finds a Pareto-optimal solution without aggregating the agents' preferences. We present an extension of this protocol that allows dynamic restructuration of coalitions. We also present a behavior model for agents that is well adapted to our coalition formation protocol. An application based on teaching scheduling has ...


Behavioral study of piston manufacturing plant through stochastic models

Pistons play a vital role in almost all types of vehicles. The present study discusses the behavior of a piston manufacturing plant. Manufacturing plants are complex repairable systems and therefore, it is difficult to evaluate the performance of a piston manufacturing plant using stochastic models. The stochastic model is an efficient performance evaluator for repairable systems. In...


Application of Stochastic Optimal Control, Game Theory and Information Fusion for Cyber Defense Modelling

The present paper addresses an effective cyber defense model by applying information-fusion-based game-theoretical approaches. We try to improve previous models by applying stochastic optimal control and robust optimization techniques. Jump processes are applied to model different and complex situations in cyber games. Applying jump processes we propose some m...



Publication date: 2005